CrowdRec: 3D Crowd Reconstruction from Single Color Images
This is a technical report for the GigaCrowd challenge. Reconstructing 3D
crowds from monocular images is a challenging problem due to mutual occlusions,
severe depth ambiguity, and complex spatial distributions. Because no
large-scale 3D crowd dataset is available for training a robust model, current
multi-person mesh recovery methods can hardly achieve satisfactory performance in crowded
scenes. In this paper, we exploit the crowd features and propose a
crowd-constrained optimization to improve the common single-person method on
crowd images. To avoid scale variations, we first detect human bounding-boxes
and 2D poses from the original images with off-the-shelf detectors. Then, we
train a single-person mesh recovery network using existing in-the-wild image
datasets. To promote a more reasonable spatial distribution, we further propose
a crowd constraint to refine the single-person network parameters. With the
optimization, we can obtain accurate body poses and shapes with reasonable
absolute positions from a large-scale crowd image using a single-person
backbone. The code will be publicly available
at \url{https://github.com/boycehbz/CrowdRec}. (Comment: technical report)
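The detect-then-refine pipeline described above can be sketched as follows. This is a hypothetical illustration of the idea, not the authors' released code: the detector, the single-person backbone, and the crowd constraint are all toy stand-ins, and the depth-separation rule is our own simplification of the spatial-distribution constraint.

```python
# Toy sketch of a CrowdRec-style pipeline (illustrative names, not the real API):
# 1) detect people, 2) run a single-person backbone per crop,
# 3) refine absolute depths with a crowd constraint.

def detect_people(image):
    # Stand-in for an off-the-shelf detector: returns (bbox, 2D pose) pairs.
    return [((10, 10, 50, 120), [(30, 20), (30, 60)]),
            ((60, 12, 100, 118), [(80, 22), (80, 62)])]

def single_person_mesh(bbox, pose2d):
    # Stand-in for the single-person recovery network: a coarse per-person
    # estimate with an initial depth guess derived from bbox height.
    x0, y0, x1, y1 = bbox
    return {"bbox": bbox, "depth": 100.0 / (y1 - y0)}

def crowd_constraint_refine(people, min_gap=0.05, steps=10):
    # Toy crowd constraint: push people whose depths nearly coincide apart,
    # mimicking the idea of enforcing a plausible spatial distribution.
    for _ in range(steps):
        for i in range(len(people)):
            for j in range(i + 1, len(people)):
                gap = people[j]["depth"] - people[i]["depth"]
                if abs(gap) < min_gap:
                    people[j]["depth"] += min_gap - abs(gap)
    return people

image = object()  # placeholder input
people = [single_person_mesh(b, p) for b, p in detect_people(image)]
people = crowd_constraint_refine(people)
```

The point of the sketch is the division of labor: per-person estimates come from a network trained on ordinary in-the-wild data, and only the cheap crowd-level refinement needs to know about the scene as a whole.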
Occluded Human Body Capture with Self-Supervised Spatial-Temporal Motion Prior
Although significant progress has been achieved on monocular marker-less human
motion capture in recent years, it is still hard for state-of-the-art methods
to obtain satisfactory results in occlusion scenarios. There are two main
reasons: one is that occluded motion capture is inherently ambiguous, as
various 3D poses can map to the same 2D observations, which often results in
unreliable estimates. The other is that no sufficient occluded human data
are available for training a robust model. To address these obstacles, our key
idea is to employ non-occluded human data to learn a joint-level
spatial-temporal motion prior for occluded humans with a self-supervised
strategy. To further
reduce the gap between synthetic and real occlusion data, we build the first 3D
occluded motion dataset (OcMotion), which can be used for both training and
testing. We encode the motions in 2D maps and synthesize occlusions on
non-occluded data for the self-supervised training. A spatial-temporal layer is
then designed to learn joint-level correlations. The learned prior reduces the
ambiguities of occlusions and is robust to diverse occlusion types, which is
then adopted to assist the occluded human motion capture. Experimental results
show that our method can generate accurate and coherent human motions from
occluded videos with good generalization ability and runtime efficiency. The
dataset and code are publicly available at
\url{https://github.com/boycehbz/CHOMP}.
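The self-supervised recipe above — encode motions as 2D maps, then synthesize occlusions on non-occluded data — can be illustrated with a minimal sketch. Everything here is our own toy construction (map layout, patch shape, masking rule), not the paper's implementation; the prior would be trained to inpaint the masked entries.

```python
import random

# Toy sketch: a motion becomes a joints x frames 2D map, and occlusions are
# synthesized by zeroing random joint/frame patches; the masked entries are
# the reconstruction targets for self-supervised training.

def motion_to_map(motion):
    # motion: list of frames, each a list of per-joint values.
    joints = len(motion[0])
    return [[motion[t][j] for t in range(len(motion))] for j in range(joints)]

def synthesize_occlusion(motion_map, n_patches=2, patch=2, seed=0):
    rng = random.Random(seed)
    occluded = [row[:] for row in motion_map]
    mask = [[1] * len(row) for row in motion_map]
    for _ in range(n_patches):
        j0 = rng.randrange(len(occluded))                     # pick a joint
        t0 = rng.randrange(max(1, len(occluded[0]) - patch))  # pick a start frame
        for t in range(t0, t0 + patch):
            occluded[j0][t] = 0.0  # simulated occluded observation
            mask[j0][t] = 0        # training target: reconstruct here
    return occluded, mask

motion = [[float(t + j) for j in range(3)] for t in range(6)]  # 6 frames, 3 joints
mmap = motion_to_map(motion)
occ, mask = synthesize_occlusion(mmap)
```

Because the corruption is synthetic, every masked entry has a known ground-truth value, which is what makes the strategy self-supervised.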
Observation of the superconducting proximity effect in the surface state of SmB6 thin films
The proximity effect at the interface between a topological insulator (TI)
and a superconductor is predicted to give rise to chiral topological
superconductivity and Majorana fermion excitations. In most TIs studied to
date, however, the conducting bulk states have overwhelmed the transport
properties and precluded the investigation of the interplay of the topological
surface state and Cooper pairs. Here, we demonstrate the superconducting
proximity effect in the surface state of SmB6 thin films which display bulk
insulation at low temperatures. The Fermi velocity in the surface state deduced
from the proximity effect is found to be as large as 10^5 m/s, in good
agreement with the value obtained from a separate transport measurement. We
show that high transparency between the TI and a superconductor is crucial for
the proximity effect. This finding opens the door to the investigation of
exotic quantum phenomena using all-thin-film multilayers with high-transparency
interfaces.
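One standard way to connect a Fermi velocity to a proximity-effect length scale (a textbook clean-limit estimate, not a formula quoted from the abstract) is the normal-state coherence length, which sets the distance over which Cooper pairs penetrate the surface state:

```latex
% Clean-limit (ballistic) normal coherence length
\xi_N = \frac{\hbar v_F}{2\pi k_B T}
% With v_F \sim 10^5\,\mathrm{m/s} and T \sim 1\,\mathrm{K}:
% \xi_N \approx \frac{(1.05\times10^{-34})(10^5)}{2\pi(1.38\times10^{-23})(1)}
%        \approx 1.2\times10^{-7}\,\mathrm{m} \approx 0.1\,\mu\mathrm{m}
```

At this scale the induced pairing extends well beyond atomic dimensions, which is why a measurable proximity signal in a thin-film surface state can be used to deduce v_F.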
Synthesizing Physically Plausible Human Motions in 3D Scenes
Synthesizing physically plausible human motions in 3D scenes is a challenging
problem. Kinematics-based methods cannot avoid inherent artifacts (e.g.,
penetration and foot skating) due to the lack of physical constraints.
Meanwhile, existing physics-based methods cannot generalize to multi-object
scenarios since the policy trained with reinforcement learning has limited
modeling capacity. In this work, we present a framework that enables physically
simulated characters to perform long-term interaction tasks in diverse,
cluttered, and unseen scenes. The key idea is to decompose human-scene
interactions into two fundamental processes, Interacting and Navigating, which
motivates us to construct two reusable controllers, i.e., InterCon and NavCon.
Specifically, InterCon contains two complementary policies that enable
characters to enter and leave the interacting state (e.g., sitting on a chair
and getting up). To generate interaction with objects at different places, we
further design NavCon, a trajectory following policy, to keep characters'
locomotion in the free space of 3D scenes. Benefiting from the divide and
conquer strategy, we can train the policies in simple environments and
generalize to complex multi-object scenes. Experimental results demonstrate
that our framework can synthesize physically plausible long-term human motions
in complex 3D scenes. Code will be publicly released at
https://github.com/liangpan99/InterScene.
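The divide-and-conquer idea above — chain a navigation policy with enter/leave interaction policies — can be sketched as a simple scheduler. The controller names come from the abstract, but the control logic below is our illustration, not the released code; both policies are toy stand-ins.

```python
# Hypothetical scheduler chaining two reusable controllers for
# "walk to object, interact, get up, move on" tasks.

def nav_con(state, target):
    # Trajectory-following policy stand-in: step toward the target
    # one grid cell at a time through free space.
    x, y = state
    tx, ty = target
    step = lambda a, b: a + max(-1, min(1, b - a))
    return (step(x, tx), step(y, ty))

def inter_con(state, mode):
    # Interaction policy stand-in: "enter" sits down, "leave" gets up.
    return {"sitting": mode == "enter", "pos": state}

def perform_task(start, objects):
    state, log = start, []
    for obj in objects:
        while state != obj:                    # NavCon: navigate to the object
            state = nav_con(state, obj)
        log.append(inter_con(state, "enter"))  # InterCon: sit / interact
        log.append(inter_con(state, "leave"))  # InterCon: get up, move on
    return log

log = perform_task((0, 0), [(2, 1), (4, 3)])
```

Because each policy only needs to solve its own sub-problem, the policies can be trained in simple environments and then composed in unseen multi-object scenes, which is the generalization argument the abstract makes.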
Nonrigid Object Contact Estimation With Regional Unwrapping Transformer
Acquiring contact patterns between hands and nonrigid objects is a common
concern in the vision and robotics community. However, existing learning-based
methods focus more on contact with rigid objects from monocular images. When
adopting them for nonrigid contact, a major problem is that the existing
contact representation is restricted by the geometry of the object.
Consequently, contact neighborhoods are stored in an unordered manner and
contact features are difficult to align with image cues. At the core of our
approach lies a novel hand-object contact representation called RUPs (Region
Unwrapping Profiles), which unwraps the roughly estimated hand-object surfaces
as multiple high-resolution 2D regional profiles. The region grouping strategy
is consistent with the hand's kinematic bone division, because the bones are the
primitive initiators of a composite contact pattern. Based on this
representation, our Regional Unwrapping Transformer (RUFormer) learns the
correlation priors across regions from monocular inputs and predicts
corresponding contact and deformed transformations. Our experiments demonstrate
that the proposed framework can robustly estimate the deformed degrees and
deformed transformations, which makes it suitable for both nonrigid and rigid
contact. (Comment: Accepted by ICCV202)
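The core representational move — unwrap each bone-aligned surface region into an ordered 2D profile so contact features can align with image cues — can be shown with a toy rasterizer. This is our own minimal construction, not the paper's geometry: we assume each surface sample already carries a bone label and (u, v) surface coordinates.

```python
# Toy sketch of the RUP idea: group surface samples by hand bone, then
# rasterize each group's (u, v) coordinates into a small 2D regional profile,
# turning an unordered contact neighborhood into an ordered 2D grid.

def unwrap_regions(samples, res=4):
    # samples: list of (bone_id, u, v, value) with u, v in [0, 1).
    profiles = {}
    for bone, u, v, val in samples:
        grid = profiles.setdefault(bone, [[0.0] * res for _ in range(res)])
        grid[int(v * res)][int(u * res)] = val  # write into the regional grid
    return profiles

samples = [
    (0, 0.10, 0.10, 1.0),  # contact sample on bone 0
    (0, 0.80, 0.30, 0.5),
    (1, 0.50, 0.50, 0.9),  # contact sample on bone 1
]
profiles = unwrap_regions(samples)
```

Once every region is a fixed-size 2D grid, standard image-style attention (as in the transformer described above) can learn correlations across regions, which is exactly the alignment problem the unordered representation made hard.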
Addressing the Accuracy-Cost Tradeoff in Material Property Prediction: A Teacher-Student Strategy
Deep learning has revolutionized the process of new material discovery, with
state-of-the-art models now able to predict material properties based solely on
chemical compositions, thus eliminating the necessity for material structures.
However, this cost-effective method has led to a trade-off in model accuracy.
Specifically, the accuracy of Chemical Composition-based Property Prediction
Models (CPMs) significantly lags behind that of Structure-based Property
Prediction Models (SPMs). To tackle this challenge, we propose an innovative
Teacher-Student (T-S) strategy, where a pre-trained SPM serves as the 'teacher'
to enhance the accuracy of the CPM. We first demonstrated the universality of
this strategy: on the Materials Project (MP) and Jarvis datasets, we validated
the effectiveness of the T-S strategy in boosting the accuracy of CPMs with
two distinct network structures, namely CrabNet and Roost. Under the guidance
of the T-S strategy, T-S CrabNet emerged as the most accurate model among
current CPMs. Moreover, this
strategy shows remarkable efficacy in small datasets. When predicting the
formation energy on a small MP dataset comprising merely 5% of the samples, the
T-S strategy boosted CrabNet's accuracy by 37.1%, exceeding the enhancement
effect of the T-S strategy on the whole dataset.
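The teacher-student mechanism described above is, at its core, a distillation loss: the student fits a blend of the true labels and the teacher's predictions. The sketch below is illustrative, not T-S CrabNet — a one-parameter linear student stands in for the composition-based network, and the blending weight `alpha` is our own assumption.

```python
# Minimal distillation sketch: the "student" (composition-only model) is
# supervised by a blend of noisy labels and an accurate "teacher"
# (structure-based model), trained by plain gradient descent.

def train_student(x, y_true, y_teacher, alpha=0.5, lr=0.01, epochs=2000):
    # Student: y = w * x (toy one-parameter model).
    w = 0.0
    for _ in range(epochs):
        grad = 0.0
        for xi, yt, yk in zip(x, y_true, y_teacher):
            target = alpha * yt + (1 - alpha) * yk  # blended supervision
            grad += 2 * (w * xi - target) * xi      # d/dw of squared error
        w -= lr * grad / len(x)
    return w

x = [1.0, 2.0, 3.0]
y_true = [2.1, 3.9, 6.2]     # noisy property labels
y_teacher = [2.0, 4.0, 6.0]  # accurate teacher predictions
w = train_student(x, y_true, y_teacher)
```

This also suggests why the strategy helps most on small datasets: when labels are few and noisy, the teacher term supplies extra, lower-variance supervision.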
Development and validation of machine-learning models for the difficulty of retroperitoneal laparoscopic adrenalectomy based on radiomics
Objective: To construct and validate machine learning (ML) models for predicting the difficulty of retroperitoneal laparoscopic adrenalectomy (RPLA) based on clinical and radiomic characteristics.
Methods: Patients who had undergone RPLA at Shanxi Bethune Hospital between August 2014 and December 2020 were retrospectively gathered and randomly split into a training set and a validation set at a 7:3 ratio. The models were constructed on the training set and validated on the validation set. In addition, 117 patients gathered between January and December 2021 formed a prospective set for further validation. Radiomic features were extracted by drawing regions of interest with the 3D Slicer image computing platform and Python. Key features were selected through LASSO, and the radiomics score (Rad-score) was calculated. Various ML models were constructed by combining the Rad-score with clinical characteristics. The optimal models were selected based on precision, recall, area under the curve, F1 score, calibration curve, receiver operating characteristic curve, and decision curve analysis in the training, validation, and prospective sets. Shapley Additive exPlanations (SHAP) was used to demonstrate the impact of each variable in the respective models.
Results: After comparing the performance of 7 ML models in the training, validation, and prospective sets, the random forest (RF) model showed the most stable predictive performance, while XGBoost offered the greatest benefit to patients. According to SHAP, the variable importance of the two models is similar, and both reflect that the Rad-score has the most significant impact. Clinical characteristics such as hemoglobin, age, body mass index, gender, and diabetes mellitus also influenced the difficulty.
Conclusion: This study constructed ML models for predicting the difficulty of RPLA by combining clinical and radiomic characteristics. The models can help surgeons evaluate surgical difficulty, reduce risks, and improve patient benefits.
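The Rad-score construction described in the methods — LASSO shrinks most radiomic coefficients to zero, and the score is a weighted sum of the survivors — can be sketched in a few lines. The feature names and coefficients below are hypothetical, chosen purely for illustration.

```python
# Hedged sketch of a Rad-score: LASSO keeps a sparse set of radiomic
# features, and the score is their weighted sum plus an intercept.
# Feature names and coefficient values here are made up for illustration.

def rad_score(features, coef, intercept=0.0):
    # coef maps feature name -> LASSO coefficient; features the LASSO
    # dropped simply have no entry (their coefficient shrank to zero).
    return intercept + sum(c * features[name] for name, c in coef.items())

coef = {"glcm_contrast": 0.8, "shape_volume": -0.3}  # hypothetical survivors
features = {"glcm_contrast": 1.5, "shape_volume": 2.0, "firstorder_mean": 7.0}
score = rad_score(features, coef, intercept=0.1)
```

Note that `firstorder_mean` contributes nothing: a dropped feature simply has no coefficient, which is what makes the resulting score compact enough to combine with a handful of clinical variables in the downstream ML models.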